1. What's PMCSched?

PMCSched is an open-source, OS-oriented framework for developing scheduling algorithms. Its source code is available in a GitHub repository and was introduced as part of the PMCTrack v3.0 public release.

PMCSched is a framework for the Linux kernel that enables rapid development of the OS-level support required to create custom scheduling and resource-management schemes on both symmetric and asymmetric multicore systems (AMPs). Unlike other existing frameworks that require patching the Linux kernel to function, PMCSched makes it possible to incorporate new scheduling-related OS-level support in Linux via a kernel module that can be loaded in unmodified kernels, making its adoption easier in production systems. Notably, the main focus of this framework is to simplify the creation of novel scheduling and resource-management strategies that are either implemented entirely in the OS kernel, or require changes in different layers of the system software, so as to benefit from coordinated decisions between the runtime system and the OS scheduler.

PMCSched was born as a continuation of PMCTrack, an OS-oriented performance monitoring tool for Linux. With the PMCSched framework we take PMCTrack’s potential one step further by enabling rapid development of OS support for scheduling and resource management for Linux within a loadable kernel module. The diagram below illustrates the addition of PMCSched into the PMCTrack design.


Addition of PMCSched into PMCTrack


2. Developing an example plugin

PMCSched is built on top of PMCTrack, so the first step is to install and build PMCTrack. Instructions for that can be found here. We strongly encourage reading them, since they include insights into using PMCTrack from user space, from the OS scheduler, and through PMCTrack monitoring modules.


2.1. Defining the new plugin

Once PMCTrack is installed, we can start creating our custom scheduling algorithm. A new scheduling or resource-management algorithm is implemented by creating a scheduling plugin. We start by declaring the new plugin in PMCSched's main header, pmcsched.h. In this example, the new plugin has the ID SCHED_EXAMPLE, which is added to the enum of policies. The new plugin must also be included in the array of available schedulers, in that same header.

In short, we add it to the enum of possible plugins, declare the plugin as extern (since its functions will be defined in a separate file), and finally include it in the array of possible plugins.
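The three steps above might look roughly like the following user-space sketch. Only SCHED_EXAMPLE comes from this guide; the descriptor type sched_plugin_t, the SCHED_DUMMY entry, the array name available_plugins and the find_plugin() helper are hypothetical stand-ins, not the actual declarations in pmcsched.h. Both plugin definitions sit in the same file only so that the sketch compiles on its own; in PMCSched, example_plugin would be defined in example.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plugin descriptor; PMCSched's real type has more fields. */
typedef struct sched_plugin {
	int policy;              /* ID taken from the enum below */
	const char *description; /* shown to user space */
} sched_plugin_t;

/* Step 1: add the new ID to the enum of policies (SCHED_DUMMY is hypothetical). */
enum sched_policy {
	SCHED_DUMMY = 0,
	SCHED_EXAMPLE,
	NUM_SCHEDULERS /* keep last: number of available plugins */
};

/* Step 2: declare the plugins as extern; their definitions live elsewhere. */
extern sched_plugin_t dummy_plugin;
extern sched_plugin_t example_plugin;

/* Step 3: include the plugin in the array of available plugins, indexed by ID. */
static sched_plugin_t *available_plugins[NUM_SCHEDULERS] = {
	[SCHED_DUMMY] = &dummy_plugin,
	[SCHED_EXAMPLE] = &example_plugin,
};

/* Stand-ins for definitions that would normally live in separate .c files. */
sched_plugin_t dummy_plugin = { SCHED_DUMMY, "Dummy plugin" };
sched_plugin_t example_plugin = { SCHED_EXAMPLE, "Example plugin: random migrations" };

/* Look up a plugin by ID; returns NULL for out-of-range IDs. */
static sched_plugin_t *find_plugin(int policy)
{
	if (policy < 0 || policy >= NUM_SCHEDULERS)
		return NULL;
	return available_plugins[policy];
}
```

Keeping NUM_SCHEDULERS as the last enum member lets the array size follow the number of plugins automatically, so registering a plugin only touches the enum, the extern declaration and the array initializer.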

After that, we can start developing our new plugin right away, in a new file, example.c. Creating a plugin works similarly to creating a Linux kernel module: the plugin follows a "contract" by implementing a set of functions that every plugin must provide, each with a specific prototype and intended to handle a specific event.

The various algorithm-specific operations are invoked from the core of the scheduling framework when a key scheduling-related event occurs, such as when a thread enters the system, terminates, or becomes runnable/non-runnable, or when tick processing is due to update statistics. The framework also provides a set of callbacks to carry out periodic scheduling activations from interrupt (timer) and process (kernel thread) context on each core group separately. This makes it possible to invoke a wide range of blocking and non-blocking scheduling-related kernel API calls, such as those that map a thread to a specific CPU or core group. This modular approach to creating scheduling algorithms resembles the one used by scheduling classes (algorithms) inside the Linux kernel, but with a striking advantage: PMCSched scheduling plugins can be bundled in a kernel module that can be loaded on unmodified kernels.
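To make this event-driven contract concrete, here is a user-space sketch of how a framework core can forward events to the active plugin through a table of function pointers, skipping optional hooks that a plugin leaves NULL. It illustrates the general pattern only and is not PMCSched's code: struct plugin_ops, the hook names, notify_active_thread(), notify_fork() and the toy counting plugin are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread descriptor passed to every hook. */
struct task { int pid; };

/* Hypothetical hook table: required hooks must be set, optional ones may be NULL. */
struct plugin_ops {
	void (*on_active_thread)(struct task *t);   /* required */
	void (*on_inactive_thread)(struct task *t); /* required */
	void (*on_fork_thread)(struct task *t);     /* optional */
};

/* The plugin currently selected by the framework. */
static const struct plugin_ops *active_plugin;

/* Framework side: a thread became runnable; required hook, called directly. */
static void notify_active_thread(struct task *t)
{
	active_plugin->on_active_thread(t);
}

/* Framework side: optional hooks are invoked only if the plugin provides them. */
static void notify_fork(struct task *t)
{
	if (active_plugin->on_fork_thread)
		active_plugin->on_fork_thread(t);
}

/* Plugin side: a toy plugin that only counts activations. */
static int activations;
static void count_active(struct task *t) { (void)t; activations++; }
static void ignore_inactive(struct task *t) { (void)t; }

static const struct plugin_ops counting_plugin = {
	.on_active_thread = count_active,
	.on_inactive_thread = ignore_inactive,
	.on_fork_thread = NULL, /* fork events are not tracked */
};
```

The NULL check in notify_fork() is what makes a hook optional: a plugin that does not care about an event simply leaves the pointer unset.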

Let us first list the basic events our scheduling plugin must react to:

  1. Task becomes active.
  2. Task becomes inactive.
  3. Task exits the CPU.
  4. Periodic kthread calls the plugin.

Our plugin can also optionally track other events, such as:

  1. Task forked.
  2. Task migrated.
  3. A defined profiling event takes place.

These minimum required functions, together with a policy ID, optional flags, and a string description, make up the plugin definition that goes into our new file.
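A sketch of such a definition in example.c follows. The callback names on_active_thread_example(), on_inactive_thread_example() and sched_kthread_periodic_example() match those used later in this guide; the structure layout, the field names, the PLUGIN_FLAG_NONE value, on_exit_thread_example() and the plugin_is_valid() check are hypothetical and only illustrate the required/optional split listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task { int pid; };

enum { SCHED_EXAMPLE = 1 };        /* ID from the policy enum (value hypothetical) */
#define PLUGIN_FLAG_NONE 0u        /* hypothetical value for the flags field */

/* Hypothetical plugin definition: ID, flags, description and callbacks. */
typedef struct {
	int policy;
	unsigned int flags;
	const char *description;
	void (*on_active_thread)(struct task *t);   /* required */
	void (*on_inactive_thread)(struct task *t); /* required */
	void (*on_exit_thread)(struct task *t);     /* required */
	void (*sched_kthread_periodic)(void);       /* required */
	void (*on_fork_thread)(struct task *t);     /* optional */
} sched_plugin_t;

/* Empty bodies for now; the logic is filled in when implementing the plugin. */
static void on_active_thread_example(struct task *t) { (void)t; }
static void on_inactive_thread_example(struct task *t) { (void)t; }
static void on_exit_thread_example(struct task *t) { (void)t; }
static void sched_kthread_periodic_example(void) { }

sched_plugin_t example_plugin = {
	.policy = SCHED_EXAMPLE,
	.flags = PLUGIN_FLAG_NONE,
	.description = "Example plugin: random migrations within groups",
	.on_active_thread = on_active_thread_example,
	.on_inactive_thread = on_inactive_thread_example,
	.on_exit_thread = on_exit_thread_example,
	.sched_kthread_periodic = sched_kthread_periodic_example,
	/* .on_fork_thread left NULL: this optional event is not tracked */
};

/* Checks the "contract": every required callback must be provided. */
static bool plugin_is_valid(const sched_plugin_t *p)
{
	return p->description != NULL &&
	       p->on_active_thread && p->on_inactive_thread &&
	       p->on_exit_thread && p->sched_kthread_periodic;
}
```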

Some of the fields of our plugin are meant to ease development from user space. PMCSched exposes an entry in the /proc filesystem that lists the available schedulers, their descriptions, and their IDs. Writing to these configuration files (e.g., with echo) also enables or disables verbose mode, in which some parts of PMCSched print meaningful information. In general, PMCSched uses trace_printk() to output debugging information, but plugins can check active_scheduler_verbose before printing information with printk(). Alternatively, plugins can check whether DEBUG was defined (see file pmcsched.c).
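The verbose-gated printing pattern can be sketched in user space as follows, with printf() standing in for printk(). Only the flag name active_scheduler_verbose and the DEBUG switch come from the text above; example_log(), example_debug() and treating the flag as a plain global variable are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* In PMCSched this flag is toggled from user space through /proc;
 * here it is a plain global variable (assumption). */
static int active_scheduler_verbose;

/* Hypothetical helper: print only in verbose mode (printf stands in for printk).
 * Returns the number of characters printed, or 0 when silent. */
static int example_log(const char *msg, int pid)
{
	if (!active_scheduler_verbose)
		return 0;
	return printf("example: %s (pid=%d)\n", msg, pid);
}

/* Compile-time alternative: debug output exists only when DEBUG is defined. */
#ifdef DEBUG
#define example_debug(...) printf(__VA_ARGS__)
#else
#define example_debug(...) ((void)0)
#endif
```

The runtime check lets users toggle output without reloading the module, while the DEBUG variant removes the printing code entirely from non-debug builds.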

Once the new plugin is defined and added, we can start implementing the functions that handle these events.


2.2 Implementing the logic of the new plugin

We can now implement the logic of our example plugin. As an example, let us implement a plugin that triggers random migrations within groups of applications.

When a task becomes active, the function on_active_thread_example() is called. There, we insert the thread into the app's list of active threads and, if needed, the app into the list of active apps of the current group. We can also check whether it is a newly created application and, if the DEBUG option is enabled, report it through the kernel buffer.
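A user-space sketch of on_active_thread_example() follows. To compile outside the kernel, it re-implements a tiny subset of the kernel's list_head helpers (INIT_LIST_HEAD(), list_add_tail(), list_empty()). The app, thread and group structures, their fields, and the extra group parameter of the handler are hypothetical; the real PMCSched handler signature may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Minimal user-space stand-in for the kernel's struct list_head helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

/* Insert node n just before head h, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical bookkeeping structures. */
struct app {
	int id;
	int nr_active;               /* number of active threads */
	struct list_head threads;    /* active threads of this app */
	struct list_head group_link; /* node in group->active_apps */
};

struct thread {
	int pid;
	struct app *app;
	struct list_head app_link;   /* node in app->threads */
};

struct group {
	int nr_active_apps;
	struct list_head active_apps;
};

/* Called when a thread becomes active (hypothetical signature). */
static void on_active_thread_example(struct group *g, struct thread *t)
{
	struct app *a = t->app;

	list_add_tail(&t->app_link, &a->threads);

	/* First active thread (e.g. a newly created app): add the app to the group. */
	if (a->nr_active++ == 0) {
		list_add_tail(&a->group_link, &g->active_apps);
		g->nr_active_apps++;
#ifdef DEBUG
		printf("example: app %d is now active\n", a->id);
#endif
	}
}
```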

When the task becomes inactive, the function on_inactive_thread_example() is called. Its logic mirrors the activation case: we remove the thread's structure from the per-application and global lists.
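The inverse operation, on_inactive_thread_example(), can be sketched the same way. As before, the list helpers are a minimal user-space stand-in for the kernel's (list_del_init() unlinks a node and re-initializes it so it points to itself), and the structures and handler signature are hypothetical.

```c
#include <assert.h>

/* Minimal user-space stand-in for the kernel's struct list_head helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink e from its list and leave it as an empty, self-linked node. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical bookkeeping structures (same shape as in the activation case). */
struct app {
	int id;
	int nr_active;
	struct list_head threads;
	struct list_head group_link;
};

struct thread {
	int pid;
	struct app *app;
	struct list_head app_link;
};

struct group {
	int nr_active_apps;
	struct list_head active_apps;
};

/* Called when a thread stops being runnable: undoes the activation steps. */
static void on_inactive_thread_example(struct group *g, struct thread *t)
{
	struct app *a = t->app;

	list_del_init(&t->app_link);

	/* Last active thread gone: drop the app from the group's list. */
	if (--a->nr_active == 0) {
		list_del_init(&a->group_link);
		g->nr_active_apps--;
	}
}
```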

When the periodic kthread is triggered, the function sched_kthread_periodic_example() is called. This is the function that actually implements the scheduling algorithm: using the per-group linked lists, it decides which threads are allowed to run at a given time. First, we check whether there are any stopped threads (otherwise there is no point in stopping running threads), and then we traverse the list of active applications within the group.
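The check-then-traverse structure of sched_kthread_periodic_example() might look like this in a user-space sketch. The loop is the open-coded form of the kernel's list_for_each(), and container_of() recovers the enclosing app from its list node. The nr_stopped_threads counter, the return value and the structures are hypothetical; a real policy would decide, inside the loop, which threads are allowed to run.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to its list node. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical structures: only what the periodic step needs. */
struct app {
	int id;
	int nr_active;               /* active threads of this app */
	struct list_head group_link; /* node in group->active_apps */
};

struct group {
	int nr_stopped_threads;      /* threads currently kept from running */
	struct list_head active_apps;
};

/* Periodic activation (hypothetical signature). Returns the number of
 * active threads visited, or -1 if there was nothing to do. */
static int sched_kthread_periodic_example(struct group *g)
{
	struct list_head *pos;
	int nr_threads = 0;

	/* With no stopped threads, there is no point in stopping running ones. */
	if (g->nr_stopped_threads == 0)
		return -1;

	/* Open-coded list_for_each() over the group's active applications. */
	for (pos = g->active_apps.next; pos != &g->active_apps; pos = pos->next) {
		struct app *a = container_of(pos, struct app, group_link);

		nr_threads += a->nr_active;
		/* A real policy would decide here which of a's threads may run. */
	}
	return nr_threads;
}
```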

At this point, we can prepare a random migration. For this example, we grab the first thread of the first app and migrate it to a random group, generating a random group ID for this purpose. Finally, we make sure we are up to date with the most recent migrations. And voilà! We have a very basic example scheduler plugin. We just have to remember to add it to the Makefile of our target architecture; for example, on an Intel system, this is src/modules/pmcs/intel-core/Makefile.
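The random group selection described above can be sketched as follows. Here rand() stands in for a kernel random-number helper such as get_random_u32(), and recording a new group_id stands in for the actual migration, which in kernel code would involve changing the thread's CPU affinity. The structure, the helper names and the choice of always picking a group different from the current one are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal thread state: only the group it is bound to. */
struct thread { int pid; int group_id; };

/* Pick a random group different from the current one (rand() is a stand-in). */
static int random_other_group(int cur, int nr_groups)
{
	if (nr_groups < 2)
		return cur; /* nowhere else to go */
	/* An offset in 1..nr_groups-1 guarantees a different group. */
	return (cur + 1 + rand() % (nr_groups - 1)) % nr_groups;
}

/* "Migrate" the thread; a kernel plugin would update its CPU affinity here. */
static void migrate_to_random_group(struct thread *t, int nr_groups)
{
	t->group_id = random_other_group(t->group_id, nr_groups);
}
```

Drawing an offset in 1..nr_groups-1 and adding it modulo nr_groups avoids a retry loop while never returning the current group.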


3. Leveraging PMCs

Arguably, one of PMCSched's coolest features is its ability to collect information regarding Performance Monitoring Counters (PMCs), using the APIs provided by PMCTrack.

You can configure your plugin to collect certain metrics and events, such as instruction count, cycles, LLC misses, and LLC references. This is particularly useful for profiling applications as they enter the system.

Let us illustrate how to collect a number of interesting PMCs:
  1. Instructions per cycle (normalized).
  2. LLC accesses per instruction (normalized).
  3. LLC misses per instruction (normalized).
  4. LLC misses per cycle (normalized).
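The four metrics above are ratios of raw event counts. Since kernel code generally avoids floating point, a natural way to compute them is with scaled integer arithmetic. The sketch below assumes that "normalized" means multiplied by a fixed factor (here 1000); that factor, the index names and the sample_counts layout are hypothetical, not PMCSched's actual metric descriptors.

```c
#include <assert.h>
#include <stdint.h>

#define METRIC_SCALE 1000ULL /* assumed normalization factor */

/* Raw counts from one sample (hypothetical layout). */
struct sample_counts {
	uint64_t instructions, cycles, llc_refs, llc_misses;
};

/* Metric indexes, defined upfront (names hypothetical). */
enum { IPC_IDX, LLC_RPI_IDX, LLC_MPI_IDX, LLC_MPC_IDX, NUM_METRICS };

/* num/den scaled by METRIC_SCALE using integers only; 0 if den is 0. */
static uint64_t scaled_ratio(uint64_t num, uint64_t den)
{
	return den ? (num * METRIC_SCALE) / den : 0;
}

static void compute_metrics(const struct sample_counts *s, uint64_t m[NUM_METRICS])
{
	m[IPC_IDX] = scaled_ratio(s->instructions, s->cycles);     /* instr/cycle */
	m[LLC_RPI_IDX] = scaled_ratio(s->llc_refs, s->instructions); /* accesses/instr */
	m[LLC_MPI_IDX] = scaled_ratio(s->llc_misses, s->instructions); /* misses/instr */
	m[LLC_MPC_IDX] = scaled_ratio(s->llc_misses, s->cycles);   /* misses/cycle */
}
```

For instance, 2000 instructions retired in 1000 cycles give m[IPC_IDX] == 2000, that is, an IPC of 2.0 at this scale. In 32-bit kernel builds, the 64-bit division would have to go through a helper such as div64_u64().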
First, we need to prepare the descriptors for the various performance metrics, which requires defining the number of metrics and their indexes upfront.

Next, we prepare the pmcsched_counter_config_t, the configuration exposed to PMCTrack, which we pass as part of the plugin definition through the counter_config field. Here, we set the profiling mode to TBS_SCHED_MODE (time-based sampling, as opposed to event-based sampling with EBS_SCHED_MODE). We also specify that, for every new sample collected, PMCSched should call our plugin's function profile_thread_example().

The profiling function can then update global instruction counters from the sample and use this information to decide, depending on the algorithm, how to classify the application.
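The sample callback and the global counters can be sketched as follows. TBS_SCHED_MODE, EBS_SCHED_MODE and profile_thread_example() are named in the text above, but their values and signatures here are guesses; struct counter_config_sketch is a hypothetical stand-in for pmcsched_counter_config_t, whose real fields are not shown in this guide. In the kernel, counters updated from concurrent samples would likely need atomic64_t or locking.

```c
#include <assert.h>
#include <stdint.h>

/* Sampling modes named in the text; their numeric values are hypothetical. */
enum sampling_mode { TBS_SCHED_MODE, EBS_SCHED_MODE };

/* Hypothetical sample delivered to the plugin on every new sample. */
struct pmc_sample { int pid; uint64_t instructions; uint64_t cycles; };

/* Hypothetical stand-in for pmcsched_counter_config_t: mode plus callback. */
struct counter_config_sketch {
	enum sampling_mode mode;
	void (*on_new_sample)(const struct pmc_sample *s);
};

/* Global counters updated on each sample (plain integers in this sketch). */
static uint64_t total_instructions;
static uint64_t total_cycles;

/* Hypothetical signature for the per-sample profiling callback. */
static void profile_thread_example(const struct pmc_sample *s)
{
	total_instructions += s->instructions;
	total_cycles += s->cycles;
}

static const struct counter_config_sketch example_counter_config = {
	.mode = TBS_SCHED_MODE,                 /* time-based sampling */
	.on_new_sample = profile_thread_example,
};
```

With the running totals in place, a classification step could compare per-application ratios against policy-specific thresholds; which thresholds to use depends entirely on the algorithm.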


4. Publications

Here's a list of the most recent publications related to PMCSched:
  1. Rapid development of OS support with PMCSched for scheduling on asymmetric multicore systems - 20th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms - HeteroPar'22 (at Euro-Par)


5. Contact

You can contact the two main project contributors:
  1. Juan Carlos Sáez Alcaide (jcsaezal at ucm.es)
  2. Carlos Bilbao (cbilbao at ucm.es)

PMCSched: Scheduling algorithms made easy - Online documentation and introduction

Authors: Juan Carlos Sáez Alcaide and Carlos Bilbao
